Content-based data leakage detection using extended fingerprinting

Authors

  • Yuri Shapira
  • Bracha Shapira
  • Asaf Shabtai
Abstract

Protecting sensitive information from unauthorized disclosure is a major concern for every organization. Because an organization’s employees need access to such information to carry out their daily work, data leakage detection is both an essential and a challenging task. Whether caused by malicious intent or an inadvertent mistake, data loss can result in significant damage to the organization. Fingerprinting is a content-based method for detecting data leakage: signatures of known confidential content are extracted and matched against outgoing content in order to detect leakage of sensitive content. Existing fingerprinting methods, however, suffer from two major limitations. First, fingerprinting can be bypassed by rephrasing (or minor modification of) the confidential content; second, the whole content of a document is usually fingerprinted (including non-confidential parts), resulting in false alarms. In this paper we propose an extension to the fingerprinting approach that is based on sorted k-skip-n-grams. The proposed method produces a fingerprint of the core confidential content that ignores non-relevant (non-confidential) sections. In addition, the proposed fingerprinting method is more robust to rephrasing and can also be used to detect a previously unseen confidential document, thereby providing better detection of intentional leakage incidents.
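The abstract describes the method only at a high level. The following is a minimal Python sketch of fingerprinting with sorted k-skip-n-grams. It assumes that a k-skip-n-gram is any n words taken in their original order with at most k words skipped in total, and that the words inside each gram are sorted so that reordering within the gram does not change it. The tokenizer, the default n and k, the `core_fingerprint` filtering step (dropping grams that also occur in known non-confidential documents), the containment score and the 0.5 threshold in `is_leak` are all illustrative assumptions, not the authors' published procedure.

```python
import re
from itertools import combinations

Gram = tuple[str, ...]


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric word tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def sorted_k_skip_n_grams(tokens: list[str], n: int, k: int) -> set[Gram]:
    """Every n-gram that skips at most k tokens in total, with its words sorted."""
    grams: set[Gram] = set()
    for start in range(len(tokens)):
        # Candidate followers: the next n - 1 + k tokens after the first word.
        window = tokens[start + 1 : start + n + k]
        for picked in combinations(range(len(window)), n - 1):
            gram = (tokens[start],) + tuple(window[i] for i in picked)
            # Sorting makes the gram insensitive to word order inside it.
            grams.add(tuple(sorted(gram)))
    return grams


def fingerprint(text: str, n: int = 3, k: int = 2) -> set[Gram]:
    # Hypothetical defaults for n and k; the abstract does not state the paper's settings.
    return sorted_k_skip_n_grams(tokenize(text), n, k)


def core_fingerprint(confidential: str, non_confidential: list[str],
                     n: int = 3, k: int = 2) -> set[Gram]:
    # Assumption: grams that also occur in known non-confidential documents are
    # treated as non-relevant and dropped, leaving the "core" confidential grams.
    common: set[Gram] = set()
    for doc in non_confidential:
        common |= fingerprint(doc, n, k)
    return fingerprint(confidential, n, k) - common


def containment(confidential: set[Gram], outgoing: set[Gram]) -> float:
    # Fraction of the confidential fingerprint that reappears in outgoing content.
    if not confidential:
        return 0.0
    return len(confidential & outgoing) / len(confidential)


def is_leak(confidential: set[Gram], outgoing_text: str,
            n: int = 3, k: int = 2, threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: flag the message when containment reaches the threshold.
    return containment(confidential, fingerprint(outgoing_text, n, k)) >= threshold
```

In this sketch the skips let a gram survive words inserted into a sentence, and the sorting lets it survive local reordering, which is the kind of rephrasing the abstract targets. The cost is fingerprint size: each position yields up to C(n - 1 + k, n - 1) grams, so n = 3 and k = 2 gives at most 6 grams per word.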

Similar sources

Information–Theoretic Analysis of Privacy Protection for Noisy Identification Based on Soft Fingerprinting

Identification of contents or objects based on data that are stored or distributed in the public domain is required in various applications. At the same time, these data should not reveal any information about the original content or object that may be misused in terms of privacy leakage. We consider a privacy protection strategy based on reliable components of data and investigate the performance o...

LEAK DETECTION IN WATER DISTRIBUTION SYSTEM USING NON-LINEAR KALMAN FILTER

Leakage detection in water distribution systems plays an important role in the storage and management of water resources. Therefore, to reduce water loss in these systems, a method is needed that reacts rapidly to such events and determines their time and location of occurrence with the least possible error. In this study, in order to determine the position and amount of leakage in distribution ...

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors’ rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in higher-education institutions. There exist language-independent tools which do not yield reliable results, since the special features of each language are ignored in them. Considering the paucit...

Content Leakage Detection by Using Traffic Pattern for Trusted Content Delivery Networks

Multimedia streaming applications and services have become popular in recent years, which is why trusted video delivery that prevents undesirable content leakage has become critical. Conventional systems addressed this issue by proposing methods based on observation of streamed traffic throughout the network. The job of maintaining high detection accuracy while coping with traffic var...

Compressed Domain Video Fingerprinting Technique Using The Singular Value Decomposition

A vast amount of video data is generated around the world every day. Fast and efficient storage, indexing, browsing, and retrieval of video are necessary for the development of various multimedia database applications. Video fingerprinting is a proven and commercially available technique that can be used for content based copy detection. Fingerprints are compact content-based signatures that su...

Journal title:
  • CoRR

Volume abs/1302.2028

Publication date: 2013